17 research outputs found
Inside the class of REGEX Languages
We study different possibilities of combining the concept of homomorphic replacement with regular expressions in order to investigate the class of languages given by extended regular expressions with backreferences (REGEX). It is shown in which regard existing and natural ways to do this fail to reach the expressive power of REGEX. Furthermore, the complexity of the membership problem for REGEX with a bounded number of backreferences is considered
On the membership problem for pattern languages and related topics
In this thesis, we investigate the complexity of the membership problem for pattern languages. A pattern is a string over the union of the alphabets A and X, where X := {x_1, x_2, x_3, ...} is a countable set of variables and A is a finite alphabet containing terminals (e.g., A := {a, b, c, d}). Every pattern, e.g., p := x_1 x_2 a b x_2 b x_1 c x_2, describes a pattern language, i.e., the set of all words that can be obtained by uniformly substituting the variables in the pattern by arbitrary strings over A. Hence, u := cacaaabaabcaccaa is a word of the pattern language of p, since substituting cac for x_1 and aa for x_2 yields u. On the other hand, there is no way to obtain the word u' := bbbababbacaaba by substituting the occurrences of x_1 and x_2 in p by words over A.
The problem of deciding, for a given pattern q and a given word w, whether or not w is in the pattern language of q is called the membership problem for pattern languages. Consequently, (p, u) is a positive instance and (p, u') is a negative instance of the membership problem for pattern languages. For the unrestricted case, i.e., for arbitrary patterns and words, the membership problem is NP-complete. In this thesis, we identify classes of patterns for which the membership problem can be solved efficiently.
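The membership problem described above can be illustrated with a naive backtracking search. This is only a minimal sketch and not one of the efficient algorithms from the thesis; it takes exponential time in the worst case. The token representation (a list of strings in which variables start with "x"), the function name `find_substitution`, and the `allow_empty` flag are assumptions made for this illustration.

```python
def is_variable(token):
    # Assumption for this sketch: variables are written x1, x2, ...;
    # every other token is a terminal.
    return token.startswith("x")


def find_substitution(pattern, word, allow_empty=False):
    """Return a substitution mapping variables to words such that the
    pattern yields `word`, or None if no such substitution exists.

    Naive backtracking: tries every possible length for each variable
    the first time that variable is encountered.
    """
    min_len = 0 if allow_empty else 1

    def search(i, pos, subst):
        if i == len(pattern):
            # All tokens consumed: succeed only if the whole word is used.
            return dict(subst) if pos == len(word) else None
        token = pattern[i]
        if not is_variable(token):
            # Terminal: must appear literally at the current position.
            if word.startswith(token, pos):
                return search(i + 1, pos + len(token), subst)
            return None
        if token in subst:
            # Repeated variable: must be replaced by the same word again.
            value = subst[token]
            if word.startswith(value, pos):
                return search(i + 1, pos + len(value), subst)
            return None
        # First occurrence of a variable: guess its image.
        for end in range(pos + min_len, len(word) + 1):
            subst[token] = word[pos:end]
            result = search(i + 1, end, subst)
            if result is not None:
                return result
            del subst[token]
        return None

    return search(0, 0, {})


# The pattern and words from the abstract.
p = "x1 x2 a b x2 b x1 c x2".split()
u = "cacaaabaabcaccaa"
u_prime = "bbbababbacaaba"
```

With these definitions, `find_substitution(p, u)` returns the substitution that maps x_1 to cac and x_2 to aa, while `find_substitution(p, u_prime)` returns `None`, both with and without empty substitutions.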
Our first main result in this regard is that the variable distance, i.e., the maximum number of different variables that separate two consecutive occurrences of the same variable, substantially contributes to the complexity of the membership problem for pattern languages. More precisely, for every class of patterns with a bounded variable distance the membership problem can be solved efficiently. The second main result is that the same holds for every class of patterns with a bounded scope coincidence degree, where the scope coincidence degree is the maximum number of intervals that cover a common position in the pattern, where each interval is given by the leftmost and rightmost occurrence of a variable in the pattern.
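The two parameters can be computed directly from their informal definitions above. The sketch below assumes the same token representation as in the abstract's example (variables written x1, x2, ...) and treats each interval as closed; the function names are hypothetical, and the formal definitions in the thesis may differ in detail.

```python
def variable_distance(pattern):
    """Maximum number of different variables that occur strictly between
    two consecutive occurrences of the same variable."""
    last_seen = {}
    best = 0
    for idx, token in enumerate(pattern):
        if not token.startswith("x"):
            continue  # terminals do not count
        if token in last_seen:
            between = {
                t for t in pattern[last_seen[token] + 1:idx]
                if t.startswith("x")
            }
            best = max(best, len(between))
        last_seen[token] = idx
    return best


def scope_coincidence_degree(pattern):
    """Maximum number of variable scopes covering a common position,
    where the scope of a variable is the closed interval from its
    leftmost to its rightmost occurrence."""
    first, last = {}, {}
    for idx, token in enumerate(pattern):
        if token.startswith("x"):
            first.setdefault(token, idx)
            last[token] = idx
    return max(
        (
            sum(1 for v in first if first[v] <= pos <= last[v])
            for pos in range(len(pattern))
        ),
        default=0,
    )


p = "x1 x2 a b x2 b x1 c x2".split()
```

For the pattern p from the abstract, both functions give small values (variable distance 1, scope coincidence degree 2). A pattern such as x_1 x_2 x_3 x_1 x_2 x_3 gives 2 and 3 respectively, and the values grow as more variables are interleaved in this way.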
The proof of our first main result is based on automata theory. More precisely, we introduce a new automata model that is used as an algorithmic framework in order to show that the membership problem for pattern languages can be solved in time that is exponential only in the variable distance of the corresponding pattern. We then take a closer look at this automata model and subject it to a sound theoretical analysis. The second main result is obtained in a completely different way. We encode patterns and words as relational structures and we then reduce the membership problem for pattern languages to the homomorphism problem of relational structures, which allows us to exploit the concept of the treewidth. This approach turns out to be successful, and we show that it has the potential to identify further classes of patterns with a polynomial time membership problem.
Furthermore, we take a closer look at two aspects of pattern languages that are indirectly related to the membership problem. Firstly, we investigate the phenomenon that patterns can describe regular or context-free languages in an unexpected way, which implies that their membership problem can be solved efficiently. In this regard, we present several sufficient conditions and necessary conditions for the regularity and context-freeness of pattern languages. Secondly, we compare pattern languages with languages given by so-called extended regular expressions with backreferences (REGEX). The membership problem for REGEX languages is very important in practice and since REGEX are similar to pattern languages, it might be possible to improve algorithms for the membership problem for REGEX languages by investigating their relationship to patterns. In this regard, we investigate how patterns can be extended in order to describe large classes of REGEX languages
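The similarity between patterns and REGEX can be illustrated with a practical regex engine. The sketch below translates a pattern into a Python `re` expression with named groups and backreferences. It assumes non-empty substitutions and the terminal alphabet {a, b, c, d}; the helper name `pattern_to_regex` is hypothetical. Python's `re` dialect is only one practical implementation of backreferences, not the formal REGEX model studied in the thesis.

```python
import re


def pattern_to_regex(pattern, alphabet="abcd"):
    """Translate a pattern (list of tokens, variables written x1, x2, ...)
    into a compiled regular expression with backreferences."""
    char_class = "[" + "".join(re.escape(c) for c in alphabet) + "]"
    seen = set()
    parts = []
    for token in pattern:
        if not token.startswith("x"):
            # Terminal symbol: match it literally.
            parts.append(re.escape(token))
        elif token in seen:
            # Repeated variable: backreference to its first occurrence.
            parts.append(f"(?P={token})")
        else:
            # First occurrence: capture a non-empty word over the alphabet.
            seen.add(token)
            parts.append(f"(?P<{token}>{char_class}+)")
    return re.compile("".join(parts))


p = "x1 x2 a b x2 b x1 c x2".split()
regex = pattern_to_regex(p)
match = regex.fullmatch("cacaaabaabcaccaa")
no_match = regex.fullmatch("bbbababbacaaba")
```

Here `match.groupdict()` recovers the substitution of cac for x_1 and aa for x_2, and `no_match` is `None`. Backtracking engines of this kind can take exponential time on such expressions, which is consistent with the NP-completeness of the membership problem.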
Regular and context-free pattern languages over small alphabets
Pattern languages are generalisations of the copy language, which is a standard
textbook example of a context-sensitive and non-context-free language. In this
work, we investigate a counter-intuitive phenomenon: with respect to alphabets
of size 2 and 3, pattern languages can be regular or context-free in an unexpected
way. For this regularity and context-freeness of pattern languages, we give
several sufficient and necessary conditions and improve known results
Regular and context-free pattern languages over small alphabets
Pattern languages are generalisations of the copy language,
which is a standard textbook example of a context-sensitive and
non-context-free language. In this work, we investigate a counter-intuitive
phenomenon: with respect to alphabets of size 2 and 3, pattern languages
can be regular or context-free in an unexpected way. For this regularity
and context-freeness of pattern languages, we give several sufficient and
necessary conditions and improve known results
On multi-head automata with restricted nondeterminism
In this work, we consider deterministic two-way multi-head automata, the input heads of which are nondeterministically initialised, i.e., in every computation each input head is initially located at some nondeterministically chosen position of the input word. This model serves as an instrument to investigate restricted nondeterminism of two-way multi-head automata. Our result is that, in terms of expressive power, two-way multi-head automata with nondeterminism in the form of nondeterministically initialising the input heads or with restricted nondeterminism in the classical way, i.e., in every accepting computation the number of nondeterministic steps is bounded by a constant, do not yield an advantage over their completely deterministic counterparts with the same number of input heads. We conclude this paper with a brief application of this result
Patterns with bounded treewidth
A pattern is a string consisting of variables and terminal symbols, and its language is the set of all words that can be obtained by substituting arbitrary words for the variables. The membership problem for pattern languages, i.e., deciding whether or not a given word is in the pattern language of a given pattern, is NP-complete. We show that any parameter of patterns that is an upper bound for the treewidth of appropriate encodings of patterns as relational structures, if restricted, allows the membership problem for pattern languages to be solved in polynomial time. Furthermore, we identify new such parameters
Automata with modulo counters and nondeterministic counter bounds
We introduce and investigate Nondeterministically Bounded
Modulo Counter Automata (NBMCA), which are two-way one-head automata
that comprise a constant number of modulo counters, where the
counter bounds are nondeterministically guessed, and this is the only
element of nondeterminism. NBMCA are tailored to recognising those
languages that are characterised by the existence of a specific factorisation
of their words, e.g., pattern languages. In this work, we subject
NBMCA to a theoretically sound analysis
Automata with Modulo Counters and Nondeterministic Counter Bounds
We introduce and investigate Nondeterministically Bounded Modulo Counter
Automata (NBMCA), which are two-way multi-head automata that comprise a
constant number of modulo counters, where the counter bounds are nondeterministically
guessed, and this is the only element of nondeterminism. NBMCA are
tailored to recognising those languages that are characterised by the existence of
a specific factorisation of their words, e.g., pattern languages. In this work, we
subject NBMCA to a theoretically sound analysis
Patterns with bounded treewidth
We show that any parameter of patterns that is an upper
bound for the treewidth of appropriate encodings of patterns as relational
structures, if restricted to a constant, allows the membership problem
for pattern languages to be solved in polynomial time. Furthermore, we
identify a new such parameter, called the scope coincidence degree
A polynomial time match test for large classes of extended regular expressions
In the present paper, we study the match test for extended regular expressions. We approach this NP-complete problem by introducing a novel variant of two-way multihead automata, which reveals
that the complexity of the match test is determined by a hidden combinatorial property of extended regular expressions, and it shows that
a restriction of the corresponding parameter leads to rich classes with
a polynomial time match test. For presentational reasons, we use the
concept of pattern languages in order to specify extended regular expressions. While this decision, formally, slightly narrows the scope of our
results, an extension of our concepts and results to more general notions
of extended regular expressions is straightforward